DETAILED ACTION

Claim Objections 

1. Claims 1-5 are objected to because of the following informalities:

In claim 1, line 7, "a phoneme cluster" should be changed to "the phoneme cluster" in
order to provide proper antecedent basis. Claims 2-5 are objected to because they
depend on objected claim 1.

Claim Rejections - 35 USC §103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

Claims 1-15 are rejected under 35 U.S.C. 103(a) as being unpatentable over Kao
et al. (U.S. Patent: 6,317,712 B1), hereinafter referred to as Kao, in view of Yan (U.S.
Patent: 6,789,063 B1).

2. As per claim 1, Kao teaches a speech processing method comprising: receiving
speech signals (Kao, figure 3, subblock 11); processing the received speech signals (Kao,
figure 3, subblock 12 and 13); to generate a plurality of phoneme clusters (Kao, figure 3,
subblock 14); grouping the phoneme clusters into a first cluster node and a second cluster
node (Kao, figure 3, subblock 14; figure 4).



Kao does not explicitly teach determining automatically if a phoneme cluster in the first 
cluster node is to be moved into the second cluster node based on a likelihood increase of 
the phone cluster of the first cluster node from being in the first cluster node to being in 
the second cluster node. However, Yan teaches determining automatically if a phoneme 
cluster in the first cluster node is to be moved into the second cluster node based on a 
likelihood increase of the phone cluster of the first cluster node from being in the first 
cluster node to being in the second cluster node (Yan, col. 1, lines 40-58 and col.3, lines 
5-8). ("A set of states can be recursively partitioned into subsets according to the 
answers to the questions at each node when traversing the tree from the root node to its 
leaf nodes. All states that reach the same leaf nodes are considered similar and are 
clustered together. The tree construction is a top-down data driven process based on a 
one-step greedy tree growing algorithm. The goodness-of-split criterion is based on 
maximum likelihood (ML) of the training data. Initially all corresponding HMM states 
of all triphones that share the same basic phone are pooled in the root node and the log- 
likelihood of the training data is calculated based on the assumption that all the states in 
the node are tied. This node is then split into two by the question that gives the 
maximum increase in log-likelihood of the training data when partitioning the states in 
the node. This process is repeated until the increase falls below a threshold", Yan, col. 1, 
lines 40-58, "During the decision tree construction, if all the data associated with all the
states in a node is less than a threshold, the node is no longer split and becomes a leaf
node", Yan, col.3, lines 5-8). 
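The tree-growing procedure quoted from Yan can be illustrated with a minimal sketch. It is not taken from Kao or Yan: it assumes one-dimensional observations, a single maximum-likelihood Gaussian per node, and hypothetical names (pooled_loglik, best_split, grow), triphone labels, questions, and threshold values chosen only for illustration.

```python
import math

def pooled_loglik(samples, var_floor=1e-3):
    """Log-likelihood of samples under one ML-fitted 1-D Gaussian,
    i.e. assuming every state pooled in the node is tied."""
    n = len(samples)
    if n == 0:
        return 0.0
    mean = sum(samples) / n
    ss = sum((x - mean) ** 2 for x in samples)
    var = max(ss / n, var_floor)  # floor avoids log(0) on degenerate data
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)

def pool(states):
    """Flatten the samples of every state in a node into one list."""
    return [x for samples in states.values() for x in samples]

def best_split(states, questions, min_gain, min_count):
    """Return (gain, question, yes, no) for the best split, or None for a leaf."""
    data = pool(states)
    if len(data) < min_count:  # too little data: node becomes a leaf
        return None
    base = pooled_loglik(data)
    best = None
    for name, predicate in questions.items():
        yes = {k: v for k, v in states.items() if predicate(k)}
        no = {k: v for k, v in states.items() if not predicate(k)}
        if not yes or not no:  # question does not partition this node
            continue
        gain = pooled_loglik(pool(yes)) + pooled_loglik(pool(no)) - base
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    if best is None or best[0] < min_gain:  # increase below threshold: stop
        return None
    return best

def grow(states, questions, min_gain=1.0, min_count=4):
    """One-step greedy, top-down tree growing; leaves are lists of tied states."""
    split = best_split(states, questions, min_gain, min_count)
    if split is None:
        return sorted(states)
    _, name, yes, no = split
    return {"question": name,
            "yes": grow(yes, questions, min_gain, min_count),
            "no": grow(no, questions, min_gain, min_count)}

# Hypothetical triphones "left-centre+right" sharing the basic phone "a".
states = {
    "m-a+t": [1.0, 1.1, 0.9],
    "n-a+t": [1.05, 0.95, 1.0],
    "s-a+t": [3.0, 3.1, 2.9],
    "f-a+t": [2.95, 3.05, 3.0],
}
questions = {
    "L_nasal": lambda t: t.split("-")[0] in {"m", "n"},
    "L_fricative": lambda t: t.split("-")[0] in {"s", "f"},
    "L_is_m": lambda t: t.split("-")[0] == "m",
}
tree = grow(states, questions)
print(tree)
```

With this toy data the root is split on the nasal question, and the nasal node is not split further because the best remaining question yields a log-likelihood increase below the assumed threshold of 1.0.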

Kao and Yan are analogous art because they are from a similar field of endeavor 
in speech processing and large vocabulary speech recognition applications. Thus, it 
would have been obvious to one of ordinary skill in the art at the time the invention was 
made to implement the teachings of Yan into Kao since Kao teaches a speech processing 
method comprising: receiving speech signals (Kao, figure 3, subblock 11); processing the
received speech signals (Kao, figure 3, subblock 12 and 13); to generate a plurality of 
phoneme clusters (Kao, figure 3, subblock 14); grouping the phoneme clusters into a first 
cluster node and a second cluster node (Kao, figure 3, subblock 14; figure 4) and Yan 
teaches determining automatically if a phoneme cluster in the first cluster node is to be 
moved into the second cluster node based on a likelihood increase of the phone cluster of 
the first cluster node from being in the first cluster node to being in the second cluster 
node (Yan, col. 1, lines 40-58 and col.3, lines 5-8), in order to improve the decision-tree 
based acoustic modeling to better use the training data and thereby to improve the 
accuracy and robustness of the clustered acoustic models. ("A set of states can be 
recursively partitioned into subsets according to the answers to the questions at each node 
when traversing the tree from the root node to its leaf nodes. All states that reach the 
same leaf nodes are considered similar and are clustered together. The tree construction 
is a top-down data driven process based on a one-step greedy tree growing algorithm. 
The goodness-of-split criterion is based on maximum likelihood (ML) of the training
data. Initially all corresponding HMM states of all triphones that share the same basic
phone are pooled in the root node and the log-likelihood of the training data is calculated 
based on the assumption that all the states in the node are tied. This node is then split 
into two by the question that gives the maximum increase in log-likelihood of the training 
data when partitioning the states in the node. This process is repeated until the increase 
falls below a threshold", Yan, col. 1, lines 40-58, "During the decision tree construction, 
if all the data associated with all the states in a node is less than a threshold, the node is 
no longer split and becomes a leaf node", Yan, col.3, lines 5-8). 

3. As per claim 2, Kao, in view of Yan, teaches the speech processing method as 
claimed in claim 1, further comprising: moving the phoneme cluster in the first cluster 
node into the second cluster node if the first cluster node is determined to be moved into 
the second cluster node (Yan, col.3, lines 5-8). ("During the decision tree construction, if 
all the data associated with all the states in a node is less than a threshold, the node is no 
longer split and becomes a leaf node", Yan, col.3, lines 5-8). 

4. As per claim 3, Kao, in view of Yan, teaches the speech processing method as 
claimed in claim 2, wherein moving the phoneme cluster in the first cluster node into the 
second cluster node includes: moving the first cluster node into the second cluster node if 
the most likelihood increase is more than a threshold value (Yan, col. 1, lines 40-58 and 
Yan, col.3, lines 5-8). ("The tree construction is a top-down data driven process based on 
a one-step greedy tree growing algorithm. The goodness-of-split criterion is based on
maximum likelihood (ML) of the training data. Initially all corresponding HMM states
of all triphones that share the same basic phone are pooled in the root node and the log- 
likelihood of the training data is calculated based on the assumption that all the states in 
the node are tied. This node is then split into two by the question that gives the 
maximum increase in log-likelihood of the training data when partitioning the states in 
the node. This process is repeated until the increase falls below a threshold", Yan, col. 1, 
lines 40-58, "During the decision tree construction, if all the data associated with all the 
states in a node is less than a threshold, the node is no longer split and becomes a leaf 
node", Yan, col.3, lines 5-8). 

5. As per claim 4, Kao, in view of Yan, teaches the speech processing method as 
claimed in claim 1, wherein the phoneme clusters are triphone clusters based on a hidden 
Markov model (HMM) (Kao, col. 3, line 41; "Applicants teach to tie triphone HMMs").

6. As per claim 5, Kao, in view of Yan, teaches the speech processing method as 
claimed in claim 1, wherein the grouping of the phoneme clusters includes: 
grouping the triphone clusters according to answers to best phonetic context based 
questions related to the triphone clusters (Yan, col. 1, lines 36-44; "The phonetic decision 
tree is a binary tree in which a yes-no question about the phonetic context is attached to 
each node. A set of states can be recursively partitioned into subsets according to the 
answers to the questions at each node when traversing the tree from the root node to its
leaf nodes. All states that reach the same leaf nodes are considered similar and are
clustered together"). 
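The grouping by answers to phonetic-context questions described in the quoted passage can be sketched as a tree traversal. This is an illustrative assumption rather than the method of either reference: the tree, the phone classes, the leaf names, and the helper functions (left_context, right_context, find_leaf, cluster) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical phone classes used by the yes-no questions.
NASALS = {"m", "n"}
VOWELS = {"a", "e", "i", "o", "u"}

def left_context(triphone):
    """Left neighbour of a triphone written as 'left-centre+right'."""
    return triphone.split("-")[0]

def right_context(triphone):
    """Right neighbour of a triphone written as 'left-centre+right'."""
    return triphone.split("+")[1]

# A binary tree with a yes-no question about phonetic context at each node.
# Inner nodes are dicts; leaves are plain strings naming a cluster.
TREE = {
    "question": lambda t: left_context(t) in NASALS,
    "yes": "leaf_nasal_left",
    "no": {
        "question": lambda t: right_context(t) in VOWELS,
        "yes": "leaf_vowel_right",
        "no": "leaf_other",
    },
}

def find_leaf(node, triphone):
    """Traverse from the root to a leaf by answering each node's question."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](triphone) else node["no"]
    return node

def cluster(triphones, tree=TREE):
    """Group triphones that reach the same leaf node."""
    groups = defaultdict(list)
    for t in triphones:
        groups[find_leaf(tree, t)].append(t)
    return dict(groups)

clusters = cluster(["m-a+t", "n-a+k", "s-a+i", "t-a+o", "s-a+t"])
print(clusters)
```

Because membership is decided only by the answers to the questions, a triphone that never appeared in training can still be assigned to a leaf in the same way.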

7. As per claim 6, Kao teaches a speech processing system comprising: an input to 
receive speech signals (Kao, figure 1, subblock MIC, figure 2, subblock MIC); 
a processing unit to process received speech signals (Kao, figure 3, subblock 12 and 13), 
to generate a plurality of phoneme clusters from the processed received speech signals 
(Kao, figure 3, subblock 14), to group the phoneme clusters into a first cluster node and a 
second cluster node (Kao, figure 3, subblock 14; figure 4). 



Kao does not explicitly teach to determine automatically if a phoneme cluster in the first 
cluster node is to be moved into the second cluster node based on a likelihood increase of 
the phone cluster of the first cluster node from being in the first cluster node to being in 
the second cluster node. However, Yan teaches to determine automatically if a phoneme 
cluster in the first cluster node is to be moved into the second cluster node based on a 
likelihood increase of the phone cluster of the first cluster node from being in the first 
cluster node to being in the second cluster node (Yan, col. 1, lines 40-58 and Yan, col.3, 
lines 5-8). ("A set of states can be recursively partitioned into subsets according to the
answers to the questions at each node when traversing the tree from the root node to its 
leaf nodes. All states that reach the same leaf nodes are considered similar and are 
clustered together. The tree construction is a top-down data driven process based on a 
one-step greedy tree growing algorithm. The goodness-of-split criterion is based on
maximum likelihood (ML) of the training data. Initially all corresponding HMM states
of all triphones that share the same basic phone are pooled in the root node and the log- 
likelihood of the training data is calculated based on the assumption that all the states in 
the node are tied. This node is then split into two by the question that gives the 
maximum increase in log-likelihood of the training data when partitioning the states in 
the node. This process is repeated until the increase falls below a threshold", Yan, col. 1, 
lines 40-58, "During the decision tree construction, if all the data associated with all the 
states in a node is less than a threshold, the node is no longer split and becomes a leaf 
node", Yan, col.3, lines 5-8). 

Kao and Yan are analogous art because they are from a similar field of endeavor 
in speech processing and large vocabulary speech recognition applications. Thus, it 
would have been obvious to one of ordinary skill in the art at the time the invention was 
made to implement the teachings of Yan into Kao since Kao teaches a 
speech processing system comprising: an input to receive speech signals (Kao, figure 1, 
subblock MIC, figure 2, subblock MIC); a processing unit to process received speech 
signals (Kao, figure 3, subblock 12 and 13), to generate a plurality of phoneme clusters 
from the processed received speech signals (Kao, figure 3, subblock 14), to group the 
phoneme clusters into a first cluster node and a second cluster node (Kao, figure 3, 
subblock 14; figure 4) and Yan teaches to determine automatically if a phoneme cluster 
in the first cluster node is to be moved into the second cluster node based on a likelihood 
increase of the phone cluster of the first cluster node from being in the first cluster node
to being in the second cluster node (Yan, col. 1, lines 40-58 and Yan, col. 3, lines 5-8), in
order to improve the decision-tree based acoustic modeling to better use the training data 
and thereby to improve the accuracy and robustness of the clustered acoustic models. 
("A set of states can be recursively partitioned into subsets according to the answers to 
the questions at each node when traversing the tree from the root node to its leaf nodes. 
All states that reach the same leaf nodes are considered similar and are clustered together. 
The tree construction is a top-down data driven process based on a one-step greedy tree 
growing algorithm. The goodness-of-split criterion is based on maximum likelihood 
(ML) of the training data. Initially all corresponding HMM states of all triphones that 
share the same basic phone are pooled in the root node and the log-likelihood of the 
training data is calculated based on the assumption that all the states in the node are tied. 
This node is then split into two by the question that gives the maximum increase in log- 
likelihood of the training data when partitioning the states in the node. This process is 
repeated until the increase falls below a threshold", Yan, col. 1, lines 40-58, "During the 
decision tree construction, if all the data associated with all the states in a node is less 
than a threshold, the node is no longer split and becomes a leaf node", Yan, col. 3, lines
5-8).



8. As per claim 7, Kao, in view of Yan, teaches the speech processing system as 
claimed in claim 6, wherein the processing unit is to move the phoneme cluster in the 
first cluster node into the second cluster node if the first cluster node is determined to be 
moved into the second cluster node (Yan, col.3, lines 5-8). ("During the decision tree
construction, if all the data associated with all the states in a node is less than a threshold,
the node is no longer split and becomes a leaf node", col.3, lines 5-8). 



9. As per claim 8, Kao, in view of Yan, teaches the speech processing system as 
claimed in claim 7, wherein the processing unit is to move the first cluster node into the 
second cluster node if the most likelihood increase is more than a threshold value (Yan, 
col. 1, lines 40-58 and col.3, lines 5-8). ("The tree construction is a top-down data driven 
process based on a one-step greedy tree growing algorithm. The goodness-of-split 
criterion is based on maximum likelihood (ML) of the training data. Initially all 
corresponding HMM states of all triphones that share the same basic phone are pooled in 
the root node and the log-likelihood of the training data is calculated based on the 
assumption that all the states in the node are tied. This node is then split into two by the 
question that gives the maximum increase in log-likelihood of the training data when 
partitioning the states in the node. This process is repeated until the increase falls below 
a threshold", Yan, col. 1, lines 40-58, "During the decision tree construction, if all the 
data associated with all the states in a node is less than a threshold, the node is no longer 
split and becomes a leaf node", Yan, col.3, lines 5-8).



10. As per claim 9, Kao, in view of Yan, teaches the speech processing system as 
claimed in claim 6, wherein the phoneme clusters are triphone clusters based on a hidden 
Markov model (HMM) (Kao, col. 3, line 41; "Applicants teach to tie triphone HMMs").

11. As per claim 10, Kao, in view of Yan, teaches the speech processing system as
claimed in claim 9, wherein the processing unit is to group the triphone clusters 
according to answers to best phonetic context based questions related to the triphone 
clusters (Yan, col. 1, lines 36-44; "The phonetic decision tree is a binary tree in which a 
yes-no question about the phonetic context is attached to each node. A set of states can 
be recursively partitioned into subsets according to the answers to the questions at each 
node when traversing the tree from the root node to its leaf nodes. All states that reach 
the same leaf nodes are considered similar and are clustered together"). 

12. As per claim 11, Kao teaches a machine-readable medium that provides
instructions, which if executed by a processor, cause the processor to perform the
operations (Kao, col. 2, lines 15-27, figure 1) comprising: receiving speech signals
(figure 3, subblock 11); processing the received speech signals (figure 3, subblock 12 and
13); to generate a plurality of phoneme clusters (figure 3, subblock 14); grouping the 
phoneme clusters into a first cluster node and a second cluster node (figure 3, subblock 
14; figure 4). 

Kao does not explicitly teach determining automatically if a phoneme cluster in the first 
cluster node is to be moved into the second cluster node based on a likelihood increase of 
the phone cluster of the first cluster node from being in the first cluster node to being in 
the second cluster node. However, Yan teaches determining automatically if a phoneme 
cluster in the first cluster node is to be moved into the second cluster node based on a
likelihood increase of the phone cluster of the first cluster node from being in the first
cluster node to being in the second cluster node (col. 1, lines 40-58 and col.3, lines 5-8). 
("A set of states can be recursively partitioned into subsets according to the answers to 
the questions at each node when traversing the tree from the root node to its leaf nodes. 
All states that reach the same leaf nodes are considered similar and are clustered together. 
The tree construction is a top-down data driven process based on a one-step greedy tree 
growing algorithm. The goodness-of-split criterion is based on maximum likelihood 
(ML) of the training data. Initially all corresponding HMM states of all triphones that 
share the same basic phone are pooled in the root node and the log-likelihood of the 
training data is calculated based on the assumption that all the states in the node are tied. 
This node is then split into two by the question that gives the maximum increase in log- 
likelihood of the training data when partitioning the states in the node. This process is 
repeated until the increase falls below a threshold", col. 1, lines 40-58, "During the 
decision tree construction, if all the data associated with all the states in a node is less 
than a threshold, the node is no longer split and becomes a leaf node", col.3, lines 5-8). 

Kao and Yan are analogous art because they are from a similar field of endeavor 
in speech processing and large vocabulary speech recognition applications. Thus, it 
would have been obvious to one of ordinary skill in the art at the time the invention was 
made to implement the teachings of Yan into Kao since Kao teaches a speech processing 
method comprising: receiving speech signals (figure 3, subblock 11); processing the
received speech signals (figure 3, subblock 12 and 13); to generate a plurality of
phoneme clusters (figure 3, subblock 14); grouping the phoneme clusters into a first
cluster node and a second cluster node (figure 3, subblock 14; figure 4) and Yan teaches 
determining automatically if a phoneme cluster in the first cluster node is to be moved 
into the second cluster node based on a likelihood increase of the phone cluster of the 
first cluster node from being in the first cluster node to being in the second cluster node 
(col. 1, lines 40-58 and col.3, lines 5-8), in order to improve the decision-tree based 
acoustic modeling to better use the training data and thereby to improve the accuracy and 
robustness of the clustered acoustic models. ("A set of states can be recursively 
partitioned into subsets according to the answers to the questions at each node when 
traversing the tree from the root node to its leaf nodes. All states that reach the same leaf 
nodes are considered similar and are clustered together. The tree construction is a top- 
down data driven process based on a one-step greedy tree growing algorithm. The 
goodness-of-split criterion is based on maximum likelihood (ML) of the training data. 
Initially all corresponding HMM states of all triphones that share the same basic phone 
are pooled in the root node and the log-likelihood of the training data is calculated based 
on the assumption that all the states in the node are tied. This node is then split into two 
by the question that gives the maximum increase in log-likelihood of the training data 
when partitioning the states in the node. This process is repeated until the increase falls 
below a threshold", col. 1, lines 40-58, "During the decision tree construction, if all the 
data associated with all the states in a node is less than a threshold, the node is no longer 
split and becomes a leaf node", col.3, lines 5-8).

13. As per claim 12, Kao, in view of Yan, teaches the machine-readable medium of
claim 11, further providing instructions, which if executed by a processor, cause the 
processor to perform the operations comprising: moving the phoneme cluster in the first 
cluster node into the second cluster node if the first cluster node is determined to be 
moved into the second cluster node (Yan, col. 3, lines 5-8). ("During the decision tree 
construction, if all the data associated with all the states in a node is less than a threshold, 
the node is no longer split and becomes a leaf node", col.3, lines 5-8). 

14. As per claim 13, Kao, in view of Yan, teaches the machine-readable medium of 
claim 12, further providing instructions, which if executed by a processor, cause the 
processor to perform the operations comprising: moving the first cluster node into the 
second cluster node if the most likelihood increase is more than a threshold value (Yan, 
col. 1, lines 40-58 and col.3, lines 5-8). ("The tree construction is a top-down data driven 
process based on a one-step greedy tree growing algorithm. The goodness-of-split 
criterion is based on maximum likelihood (ML) of the training data. Initially all 
corresponding HMM states of all triphones that share the same basic phone are pooled in 
the root node and the log-likelihood of the training data is calculated based on the 
assumption that all the states in the node are tied. This node is then split into two by the 
question that gives the maximum increase in log-likelihood of the training data when 
partitioning the states in the node. This process is repeated until the increase falls below 
a threshold", Yan, col. 1, lines 40-58, "During the decision tree construction, if all the
data associated with all the states in a node is less than a threshold, the node is no longer
split and becomes a leaf node", Yan, col.3, lines 5-8).

15. As per claim 14, Kao, in view of Yan, teaches the machine-readable medium of
claim 11, further providing instructions, which if executed by a processor, cause the
processor to perform the operations comprising: processing the received speech signals to 
generate a plurality of phoneme clusters that are triphone clusters based on a hidden 
Markov model (HMM) (Kao, col. 3, line 41; "Applicants teach to tie triphone HMMs").

16. As per claim 15, Kao, in view of Yan, teaches the machine-readable medium of 
claim 14, further providing instructions, which if executed by a processor, cause the 
processor to perform the operations comprising: grouping the triphone clusters according 
to answers to best phonetic context based questions related to the triphone clusters (Yan, 
col. 1, lines 36-44; "The phonetic decision tree is a binary tree in which a yes-no question 
about the phonetic context is attached to each node. A set of states can be recursively 
partitioned into subsets according to the answers to the questions at each node when 
traversing the tree from the root node to its leaf nodes. All states that reach the same leaf 
nodes are considered similar and are clustered together").

Conclusion

17. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to LUIS A. SALAZAR whose telephone number is 
(571)270-5250. The examiner can normally be reached on Monday-Friday. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Pankaj Kumar, can be reached on (571)272-6000. The fax phone number for
the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published 
applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 
Customer Service Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

LS 

/Pankaj Kumar/ 

Supervisory Patent Examiner, Art Unit 4192 



